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Abstract. - We characterize the low temperature phase of a simple model for RNA secondary 
structures by determining the typical energy scale E(l) of excitations involving I bases. At low 
enough temperatures, including T = 0, we find a scaling law E{1) ~ with a small exponent 
6. Above a critical temperature, there is a different phase characterized by a relatively flat 
free energy landscape resembling that of a homopolymer with a scaling exponent 6 = 1. These 
results strengthen the evidence in favour of the existence of a glass phase at low temperatures. 



Introduction. - The folding of RNA or single stranded DNA as described by its sec- 
ondary structure is both a relevant problem in molecular biology and a challenging task for 
the statistical mechanics of disordered systems. Several authors [0-|6j have recently addressed 
the topic and put forward some evidence for the existence of a glassy phase at low temper- 
atures. Numerical studies of the specific heat demonstrate the existence of a higher order 
phase transition The nature and properties of the low temperature phase are less clear 
because of large finite size corrections. While the overlap distribution is certainly broad for 
systems of up to 1000 bases indicating a kind of glassy phase, its asymptotic behaviour 

for long sequences cannot be deduced reliably from the present simulations . Bundschuh 
and Hwa |^,^ have recently argued in favour of the existence of a glass phase. They showed 
analytically that in the disordered case the asymptotic pre-exponential scaling of the parti- 
tion function cannot be the same as for homogeneous RNA at low temperatures. They also 
showed that the system of two attractively coupled replicas of the same disordered sequence 
exhibits a phase transition from a strongly coupled low temperature phase to a phase at high 
temperatures where the replicas are essentially independent. Both results favour the existence 
of a glass transition at finite temperature, but a rigorous proof is still missing. Numerically, 
the same authors characterize the RNA conformation via the free energy cost of an imposed 
pairing (pinching) of two bases. Concentrating on the largest possible pinching excitations in 
the groundstate of a given RNA sequence they argue in favour of an excitation energy scale 
that grows logarithmically with the number N of bases in the sequence, a weak power law 
not being ruled out. However, it is not clear in what sense the excitations created by a single 
pinch are typical. 

In this paper, we propose a different numerical method to study the scale dependence of 
excitations as was originally introduced in the context of spin glass models 0-^. The aim is 
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to determine the typical free energy scale E{1) of excitations involving the bonds of I bases. 
Rather than changing boundary conditions via pinching, we introduce a perturbation in the 
bulk which allows for a better control of the actual size of excitations. 

We find that the low temperature regime is governed by excitations which scale like E{1) ~ 
where 9 « 0.23 while the high temperature phase behaves like a homopolymer with scaling 
exponent 9 — 1. The change of 9 indicates a phase transition between a liquid- like high 
temperature phase and a strongly correlated glass-like phase at low temperature. In the 
former, excitations consist of independent rearrangements of individual bases, and the paired 
pieces of the polymer can slide along each other, while in the latter the system is locked in 
a favourable secondary structure, and low-lying excitations involve correlated rearrangements 
of bonds. Note that our description is restricted to the thermodynamic and static properties 
of RNA. In particular, within our approach we cannot deal with the dynamics of RNA which 
is expected to be very slow in the low temperature phase pcj]. 



The model. - The RNA-strand is characterized by its (quenched) base sequence, real 
RNA being composed of the four constituents A,C,G and U. The single stranded polymer 
will fold back onto itself to form local double helices of stacked base pairs as found in double 
stranded DNA. The pattern of base pairings is known as the secondary structure of RNA. 
The Watson-Crick base pairs A-U and C-G have the strongest tendency to bind, and in a first 
approximation one can neglect other pairings. In this paper we concentrate on the secondary 
structure without taking into account the three-dimensional ternary structure of the polymer 
whose typical energy scale is significantly lower than that of the base pairing ^ . 

The glass phase which we will describe exists at the level of the secondary structure. One 
might imagine another glass phase appearing at the level of the ternary structure, in the form 
of the possible existence of many metastable spatial arrangements of the molecule, while the 
secondary structure would be unique. We do not address this question here. 

Only two restrictions from the spatial structure are retained: The rigidity of the polymer 
chain is taken into account by requiring a minimal distance |j — j| > s between two bases 
forming a bond to avoid very small loops of the linking strand. Furthermore, pairings of bases 
{i,j} and {k,l} with i < k < j < I are known to be very rare in real RNA for topological 
reasons. We exclude such pseudoknots altogether in this paper as they can be regarded as a 
small perturbation. They could in principle be taken into account systematically, using the 
approach of . 

To simplify the problem even further we consider a variant of the model defined in |Q 
with sequences (6i)i=i,....A' of only two different species, bi ~ A or B, with bond energies 
e{A,A) = e{B,B) = —1 and e{A,B) = e{B,A) = —2, which is expected to capture the 
essential physics of RNA. To avoid an artifact of the two letters model for s = 1 (see we 
choose s = 2 as in earlier work 1^. A natural interpretation of this model is that A and B 
correspond to small subsequences of a real RNA-strand and the energies describe their effective 
pairing affinities. To mimic the fluctuations of the latter we add a noise to all bond energies, 
Gij — ^ 6jj +'7j,i where e^j — e{bi, bj) and rji^j is a uniformly distributed variable in [—1/2, 1/2]. 
This lifts the exact degeneracy and ensures that the ground state is unique, which is certainly 
true in real RNA (or when using more realistic rules for secondary structures energies), and 
is also useful technically. In parallel, we studied models with different base pairing energies, 
and with couplings Cij that are independently drawn from a given continuous distribution 
without reference to a base sequence. The results were qualitatively the same in all models. 

Taking the energy of a secondary structure S to be the sum of pairing energies, H{S) = 
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j}£S J ' partition function of an RNA strand with iV bases is given by 

ZAr = ^exph/3i7(5jv)], (1) 

where the sum extends over all permissible secondary structures iS^v- Let us represent a 
secondary structure S by the set of numbers {pi} ascribing to each base i the base Pi = j 
that it is paired to, or pi = —1 if it is unpaired. We can then define an overlap between two 
secondary structures Sa and Sp (for the same base sequence) as 

1 ^ 

which is normalized to g"" = 1. 

Earlier investigations on very similar models concentrated on the probability distri- 
bution of the overlap, P(g), which was found to exhibit non-selfaverageness, i.e., P{q) depends 
on the realization of the sequence disorder. Such properties are interpreted as signatures JTst 
of a glass phase since they require correlations over large parts of the system, implying a 
divergent correlation length in the low temperature phase. 

However, from the numerical data it was not clear whether or not in the thermodynamic 
limit P{q) tends to a single delta peak, 5{l — q), or to a nontrivial function as in mean field spin 
glasses. The latter would imply the existence of arbitrarily large excitations of energy 0(1) 
and thus a free energy exponent = 0. For positive 9, however, the groundstate dominates at 
sufficiently low temperature and the overlap is peaked at g = 1 in the thermodynamic limit. 
The system can nevertheless possess a glass phase where favourable free energy valleys are 
separated by high barriers, but differ in free energies by terms of order . This is what 
happens for instance in the case of a directed polymer in random media (DPRM) |l4| ] , where 
6' = l/3inl + l dimensions, and we shall show below that the situation is similar in our RNA 
model at low temperatures. (Let us notice that, in the case of special sequences made up of 
A and G in the first half of the sequence and C and U in the second half, the RNA model can 
be mapped onto the DPRM jl^].) In such a case, the e-coupling method described below is a 
choice method to detect the glass phase and to compute 9. 

e-coupling at T = 0. - Our approach allows a rather direct analysis of the low-lying 
energy landscape. The basic idea is to introduce a small perturbation in the bulk which repels 
the system from its groundstate Sq P,M. More precisely, one considers the new Hamiltonian 



H'{S-e)^HiS)+eqiS,So) (3) 

where the repulsive coupling e > will be tuned appropriately with system size. The energy 
of the original groundstate is increased by e whereas every other state is shifted by a smaller 
amount. The groundstate Se of H'(S; e) is a low-lying excitation of the original system and 
has the largest distance 1 — q{Se,So) from the groundstate at the given excitation energy. 

Let us suppose that the typical energy scale of excitations involving the change of the 
pairing of I bases is E{1) ~ , or more precisely, that the disorder-averaged probability 

distribution of energies obeys a scahng law P[E{1)] = . Under the assumption that 

/(O) is finite and 9 <1 one expects the average fraction of bases involved in an excitation to 
scale as 

l-g(5„5o)^ / f{E)dE (4) 
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Fig. 1 - Left: Scaling plot for e-coupling at T = and T = 0.05 (shifted vertically by 0.3) with 
9 = 0.23, giving the distance between the coupled secondary structures versus the strength of the 
repulsion in reduced units. Right: Scaling plot at T=0.25 with 9 = 1. 



since large scale excitations of the order of the system size N dominate the disorder average 
(indicated by the overbar). Note that 9 < 1 implies that, for fixed I, excitations composed of 
several independent 'elementary' excitations are usually higher in energy so that the contri- 
bution of 'elementary' excitations dominates. As we shall see, the data fits well to the more 
general dependence 

l-g(5„5o)=5W<i>(^), (5) 

where g{N) has been introduced to account for finite size effects and is subject to two 
boundary conditions: For e 3> 3> 1, g vanishes, and thus g{N oo) = l/<i>(oo) = 1. On 
the other hand, for e ^ and small N one has to recover a behaviour linear in e/N^ which 
implies that g{N ^ 0) ^ const, and <i> behaves linearly. 

The algorithmic implementation of e-coupling is rather straightforward. The ground state 
of a particular realization of Cij can be found recursively in 0{N^) time [|l|, 11 . The new 
Hamiltonian (|^) is equivalent, up to an irrelevant constant, to a change of the bond energies 
and the new groundstate can be found in the same way as before. We have used 100 to 1000 
samples for sizes ranging from 100 to 2800 and with e ranging from 0.2 to 10.0. In Fig.0 we 
have recast our data into a scaling plot using the trial function g{N) = 1 -I- c/(l -I- N/N*) for 
the finite size effects. We found an optimal value 9 = 0.23±0.05 for the energy exponent while 
typical values for the parameters of g{N) are c ~ 0.6 and N* « 300 — 1000. We also tried 
to collapse the curves with scaling variables e/(j){N) where (j>{N) = \og{N/N*) as suggested 
in H, or <j){N) = 1 - N*/N + c{N* /Nf. Both possibilties could not be ruled out completely 
with the present data, however, the power law leads to better values of . When the range of 
1 — (? is restricted to small values (< 0.4), the finite size corrections gh{N) can be neglected, 
and the fit yields values of « 0.35 as in Ref. p7[ |. 

A better understanding of the finite size corrections would thus be needed in order to 
determine the precise value of the 9 exponent. At the present stage, taking into account the 
uncertainty about systematic errors related to the choice of fitting schemes, our data favour 
a value of 9 in the range 0.15 — 0.40, but we cannot rule out a scenario with 9 = 0. For 
models with different pairing energies we obtained similar results, favouring a power law for 
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the scaling of excitation energies, with exponents in the above-mentioned range of uncertainty. 

e-coupling at finite temperature. - Couphng two copies of the system can also be used 
in its original form at finite temperature We consider two RNA strands with the same 

random sequence coupled by the Hamiltonian 

iItot(5i,52;e) = if(5i) + ff(52) + eg(5i,52). (6) 

The thermal average of the overlap as a function of e is given by 

, Y.s^.s.<l^Sl,S2)eM-|3Htot{Sl,S2■,t)) ^ (g(5i,52)exp(-/3eg(5i,52)))i,2 , . 

Here, () denotes the average over the Gibbsian ensemble of the coupled two replica system 
while ()i_2 denotes the average with independent Boltzmann weights for the two replicas. Note 
that the partition function corresponding to the coupled Hamiltonian m) is simply related to 
the Laplace transform of P{q) with respect to /3e. We can rewrite Eq. ^) as 

^'^^^'^ ^ I ^Eexp[-/3i7(52) - f3eq{S„S2)]^ ^ ^ -^^Jn [{Z2{e;S,)),] . (8) 

where the average ()i extends only over the first replica. The averages over the secondary 
structure of a single RNA-strand are calculated by sampling the Gibbs ensemble as in [Q : The 
partition functions Zi,j corresponding to connected substrands {i, j} are calculated recursively 
and can then be used to generate secondary structures with their corresponding Boltzmann 
weight. Since the coupling is not expected to alter the single sequence statistics significantly, 
we approximated the righthand side of Eq. (||) by calculating an average over 20 randomly 
sampled structures Si for each of which Z2{e;Si) is calculated exactly. We verified that the 
results of more extensive samplings lie within the error bars of the disorder average. 

In Fig. 1^ we plot the data obtained for T — 0.05 and T = 0.25. The data collapse works 
best with a free energy exponent 9 = 0.2 — 0.3 at T = 0.05 which coincides with the zero 
temperature exponent. For T = 0.25 we have to choose 6* = 1 to superpose all the curves. 
This is the same exponent that one finds in the case of homogeneous RNA as is shown in 
the next paragraph. = 1 can no longer be interpreted as a free energy exponent at high 
temperatures. It can be understood in terms of a flat free energy landscape and uncorrelated 
behaviour of individual bases. A study of the homogeneous case is instructive in this respect. 

e-coupling in the homogeneous case. - Let us consider the problem of RNA with homo- 
geneous base pairing energy e^j = e. In this case it is possible to calculate analytically the 
asymptotic (N — > cxi) limit of the partition function of two coupled replicas with Hamiltonian 

in close analogy to Q : For each configuration of the two replicas we determine the com- 
mon bonds and the bases which are unpaired in both sequences. The contribution of these 
common elements is split into a connected part that vanishes for e = 0, and a disconnected 
part corresponding to uncoupled strands. Thus, the contribution of a configuration with n 
common elements (bonds or unpaired bases) is split into a sum of 2" terms. In each such term 
we determine the m (connected) common bonds and the / common unpaired bases which are 
not embraced by a connected common bond. Summing over all possibilities to arrange those 
m + f connected common elements on the RNA strand we arrive at a recursion relation for 
the total partition function which reads 

(^) = E E E ( " ^ / ) ^1 - E ^« - f)9l n - 2)] , (9) 



Ztot 

f>0 m>0 li...l„^>2 
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where gu = e~^'^^^ — 1 and gd — e'~^^'^{e~^^'^^^ — 1) are the connected couphngs. The 
interior of each connected common bond of length li is treated as a two repHca system with 
replaced by k — 2. Note that here we use s — 1 for the smallest permissible loop. Zi{N) 
is the partition function of one replica with N bases. The factor Zf{N ~ — f) arises 
from all bonds that can be distributed on the remaining free part of the strands outside the 
m connected common bonds, excluding the / connected unpaired bases. The combinatorial 
factor counts the possibilities to align m connected common bonds, / unpaired common bases 
and N — ~ f free bases. Of course, we have to require + / — ^• 

Let us fix the values of g^ and g^ for a moment and introduce the generating function of 
Z-tot for which we can derive a recursion relation using (j9|), 

s(c) = Z,,,iN)C^ = -i-- -S2 (- -i-- -) . (10) 

Here, 52(C) = J2'n=o Z1{N)C,^ denotes the generating function of Zf{N). Zi{N) satisfies the 
recursion relation, 

JV-2 

Zi{N) ^Zi{N-l)+Y, 9oZi{k)Zi{N -k-2), (11) 
with go = e~^'^, and the corresponding generating function Si(C) can be obtained explicitly. 



^ 1-C- V(l-C)^-4goC^ 

Wio ■ ^''^ 

The behaviour of Zi{N) for large N can be derived from this expression by inverse Laplace 
transform, see g^,|l^. Asymptotically, one finds Zi{N) = cCf^/A^^/^, where Ci = (1 + 
2^/go)~^ is the smallest singularity of Si(C). Similarly, the smallest singularity C* of S(C) 
determines the leading behaviour of Ztot- One can check that, since gd is negative, C* is always 
determined by the singularity of S2 on the RHS of Eq. (p^), i.e., C*/[l— 5dC*5(C*)— 5«C*] = Cij 
or explicitly, 

C* =Ci/[l + 5dC?S(Ci)+5uCi]- (13) 

Asymptotically, the two replica partition function behaves as Ztot(A^) = c-^^. We can now 
calculate ((z)(e) by taking the logarithmic derivative of Ztot{N) recalling the dependence of g^ 
and gd on e/iV, 

OlnZtot(iV) ^^NdlnC. _ 2ggC,^S2(Ci)e-^^-/^ + Q.e-P^'" 
' Pde ~ P de l+5.C?S2(Ci)+5uCi ' ^ ' 

From Eq. (|lj) it is clear that (q) = q{e/N) and thus 6' = 1 in the homopolymer. We can 
understand this scaling behaviour as a lack of long-range correlations in the system (A simple 
example of such a behaviour is the case of N uncoupled Ising spins in a random field under 
e-coupling). In order to affect the overlap significantly one has to introduce an extensive 
perturbation which corresponds to a finite force acting on each base and typically leads to a 
large number of independent local rearrangements. 

Phase transition. - We will now use the scaling exponent 9 to define an order parameter 
for the presumed phase transition in the disordered system. Consider the effect of a small 
extensive coupling between two replicas, i.e., e = SN. In a phase with 9 < 1 where typical 
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excitation energies are subextensive {o{N)), the coupling always dominates in the thermo- 
dynamic limit the replicas will have no overlap (q = 0) for a repulsive coupling d > while 
they are locked together {q = 1) when the coupling is attractive {5 < 0). However, ii 6 = 1 
an extensive coupling is a marginal perturbation. In the homogeneous case we can directly 
check from Eq. ( [l^ ) that small perturbations have a negligible effect, and (q) is continuous 
as S ^ 0±. From the numerical data in Fig. ^ we see that the same is true in the high 
temperature phase of disordered sequences which leads us to define the order parameter 



For a homopolymer </> = at all temperatures. For disordered sequences (f> jumps discontinu- 
ously at the critical temperature from in the high temperature phase to 9max~<Zmin = 1 in the 
glassy phase. This provides a precise mathematical definition of the transition temperature. 

Conclusion. - We have studied a simple model for the folding of RNA, taking as a 
probe the susceptibility to the introduction of a repulsion between two identical clones of the 
system. Our method clearly distinguishes a low temperature glassy regime, identical to the 
zero temperature case, where the excitation energies scale with a power law of the length, with 
a small scaling exponent. On the contrary the high temperature liquid- like phase is found to 
behave similarly to a homopolymer, both having a relatively flat free energy landscape and no 
long-range correlations in the base pairing pattern. The result for the exponent 9 gives way to 
the definition of an order parameter which jumps discontinuously at the critical temperature. 
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